Unfortunately, there is no a priori knowledge to say which is the best model for this data set. When discussing how the uncertainty of a cluster model structure can be handled when using the K-means clustering algorithm, two measurements can be used for the discussion. Figure 2.28 shows one measurement, where a data set is composed of two clusters. The two clusters have the same centre-to-centre (or between-cluster) distance in all three scenarios, but they have different within-cluster variances in the three scenarios.
Here a term is defined as the sum of within-cluster variances, also called the within-cluster sum of squares; the same quantity is also referred to as the total within-cluster variance. It was 24,132, 15,831 and …5 in the three panels of Figure 2.28, i.e., panel (a), panel (b) and panel (c), respectively. Based on the comparison of these three scenarios, it can be seen that the discrimination power between the two clusters depends on the total within-cluster variance. A greater total within-cluster variance may lead to poorer discrimination power between the clusters, while a smaller total within-cluster variance may result in better discrimination power. Therefore, the data presented in Figure 2.28(a) may have the poorest discrimination power, or the worst clustering performance, while the data presented in Figure 2.28(b) may have the best clustering performance.
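The total within-cluster variance described above can be computed directly. The sketch below is illustrative only: the data, cluster labels and centres are synthetic assumptions (not taken from Figure 2.28), and it simply sums the squared distances from each point to its cluster centre, showing that noisier clusters yield a larger Sw.

```python
import numpy as np

def within_cluster_sum_of_squares(points, labels, centres):
    """Sum of squared distances from each point to its assigned cluster centre."""
    sw = 0.0
    for k, centre in enumerate(centres):
        members = points[labels == k]          # points assigned to cluster k
        sw += np.sum((members - centre) ** 2)  # squared deviations from the centre
    return sw

rng = np.random.default_rng(0)

# Two clusters with the same centre-to-centre distance but small spread
a = rng.normal(loc=(0, 0), scale=0.5, size=(50, 2))
b = rng.normal(loc=(10, 10), scale=0.5, size=(50, 2))
points = np.vstack([a, b])
labels = np.array([0] * 50 + [1] * 50)
centres = np.array([a.mean(axis=0), b.mean(axis=0)])
sw_tight = within_cluster_sum_of_squares(points, labels, centres)

# Same centres, but noisier clusters: the larger spread gives a larger Sw,
# i.e. poorer discrimination between the two clusters
a2 = rng.normal(loc=(0, 0), scale=3.0, size=(50, 2))
b2 = rng.normal(loc=(10, 10), scale=3.0, size=(50, 2))
points2 = np.vstack([a2, b2])
centres2 = np.array([a2.mean(axis=0), b2.mean(axis=0)])
sw_noisy = within_cluster_sum_of_squares(points2, labels, centres2)

print(sw_tight < sw_noisy)  # prints True
```

This mirrors the comparison in the text: holding the between-cluster distance fixed, increasing the within-cluster spread increases the total within-cluster variance.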
Figure 2.28 Three scenarios, shown in panels (a), (b) and (c), illustrating the impact of the total within-cluster sum of squares on the clustering performance. The dots stand for the data points and the triangles stand for the cluster centres. ‘Sw’ stands for the total within-cluster variance.
Figure 2.29 shows another measurement, where two clusters are presented in three panels. In all three panels, the two clusters have the same total within-cluster variance, but they have different between-